272 research outputs found

    Diffusion-Based Audio Inpainting

    Audio inpainting aims to reconstruct missing segments in corrupted recordings. Previous methods produce plausible reconstructions when the gap length is shorter than about 100 ms, but the quality decreases for longer gaps. This paper explores recent advancements in deep learning and, particularly, diffusion models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting, offering high flexibility to regenerate gaps of arbitrary length. An improved deep neural network architecture based on the constant-Q transform, which allows the model to exploit pitch-equivariant symmetries in audio, is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics for the task of reconstructing short to mid-sized gaps. The results of a formal listening test show that the proposed method delivers performance comparable to the state of the art for short gaps, while retaining good audio quality and outperforming the baselines for the longest gap lengths tested, 150 ms and 200 ms. This work helps improve the restoration of sound recordings with fairly long local disturbances or dropouts that must be reconstructed. Comment: Submitted for publication to the Journal of Audio Engineering Society on January 30th, 202

    Efficient target-response interpolation for a graphic equalizer

    Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, held in Shanghai (China) during 20-25 March 2016. A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of other bands. Designing a graphic equalizer with a high precision requires evaluating a target response that interpolates the magnitude response at several frequency points between the command gains. Good accuracy has been previously achieved by using polynomial interpolation methods such as cubic Hermite or spline interpolation. However, these methods require large computational resources, which is a limitation in real-time applications. This paper proposes an efficient way of computing the target response without sacrificing the approximation accuracy. This new approach, called Linear Interpolation with Constant Segments (LICS), reduces the computing time of the target response by 55% and has an intrinsic parallel structure. Performance of the LICS method is assessed on an ARM Cortex-A7 core, which is commonly used in embedded systems. This work was conducted in spring 2015 when the first author was a visiting postdoctoral researcher at Aalto University. This research has been partly funded by the TIN2014-53495-R and TIN2011-23283 projects of the Ministerio de Economía y Competitividad and FEDER.

    EQ

    Equalization is widely used in acoustics and audio engineering, for example to correct the frequency response of a sound reproduction system. The design of equalizers (EQ) has advanced considerably in recent years. In this article we focus on graphic equalizers, whose design is challenging. We present two principles for implementing an equalizer: the cascade structure and the parallel structure. The latest graphic equalizers we have developed meet the critical hi-fi requirement that the frequency response must match the settings to within one decibel. In the cascade structure of a graphic equalizer, this is achieved by choosing a suitable parametric filter for each band, adjusting the bandwidths so that the effect of each gain on the neighboring bands is known, and solving the band-filter gains with the least-squares method. An accurate and efficient parallel graphic equalizer is obtained by converting the cascade structure into a delayed parallel form, which is a novelty in this field. Because updating the parameters of octave and third-octave equalizers designed with these methods requires a lot of computation, we have replaced the gain optimization with an artificial neural network. Thanks to the methods we have developed, the design problem of graphic octave and third-octave equalizers is now practically solved. Non peer reviewed

    Solving Audio Inverse Problems with a Diffusion Model

    This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with objective and subjective metrics in three different tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing. Comment: Submitted to ICASSP 202
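
    The statement that a logarithmically spaced frequency axis turns a pitch shift into a translation can be checked numerically. The following is a minimal sketch and not the CQT-Diff code: the bin layout (`f_min`, `bins_per_octave`, `n_bins`) and all function names are assumptions chosen for illustration.

```python
import math

def cqt_center_frequencies(f_min=32.70, bins_per_octave=12, n_bins=84):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def nearest_bin(freq, f_min=32.70, bins_per_octave=12):
    """Index of the log-spaced bin closest to `freq`."""
    return round(bins_per_octave * math.log2(freq / f_min))

def harmonic_bins(f0, n_harmonics=4):
    """Bins occupied by the first few harmonics of a tone with fundamental f0."""
    return [nearest_bin(h * f0) for h in range(1, n_harmonics + 1)]

freqs = cqt_center_frequencies()
f0 = freqs[24]                                      # two octaves above f_min
low = harmonic_bins(f0)                             # original note
high = harmonic_bins(f0 * 2.0 ** (5 / 12))          # same note, 5 semitones up
shift = [b - a for a, b in zip(low, high)]          # per-harmonic bin offset
print(low, high, shift)
```

    Every harmonic moves by the same number of bins, so the whole harmonic pattern slides along the frequency axis without changing shape. On a linearly spaced axis the harmonics would instead spread apart as the pitch rises.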

    Real-time emulation of the Clavinet

    Leonardo Gabrielli, Vesa Välimäki, Stefan Bilbao

    Zero-Shot Blind Audio Bandwidth Extension

    Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE uses a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/ Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

    Adversarial Guitar Amplifier Modelling With Unpaired Data

    We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplifier. The model training requires no paired data, and the resulting model emulates the target timbre well whilst being capable of real-time processing on a modern personal computer. To verify our approach we present two experiments: one that carries out unpaired training using paired data, allowing us to monitor training via objective metrics, and another that uses fully unpaired data, corresponding to a realistic scenario where a user wants to emulate a guitar timbre using only audio data from a recording. Our listening test results confirm that the models are perceptually convincing.

    Five Variations on a Feedback Theme

    This is a study on a set of feedback amplitude modulation oscillator equations. It is based on a very simple and inexpensive algorithm that is capable of generating a complex spectrum from a sinusoidal input. We examine the original and five variations on it, discussing the details of each synthesis method. These include the addition of extra delay terms, waveshaping of the feedback signal, further heterodyning, and increasing the loop delay. In addition, we provide a software implementation of these algorithms as a practical example of their application and as a demonstration of their potential for synthesis instrument design.
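
    The abstract does not state the oscillator equations. As an assumption, the sketch below uses a commonly cited first-order feedback AM form, y[n] = cos(w0 n) (1 + beta y[n-1]) with y[-1] = 0. The function name, the feedback amount `beta`, and the default sample rate are illustrative and are not taken from the paper's software implementation.

```python
import math

def feedback_am(freq_hz, beta, n_samples, sample_rate=44100.0):
    """Assumed first-order feedback AM: y[n] = cos(w0*n) * (1 + beta*y[n-1])."""
    omega = 2.0 * math.pi * freq_hz / sample_rate   # radians per sample
    out = []
    prev = 0.0                                      # y[-1] = 0 (assumed start state)
    for n in range(n_samples):
        y = math.cos(omega * n) * (1.0 + beta * prev)
        out.append(y)
        prev = y                                    # one-sample feedback path
    return out

# beta = 0 removes the feedback and leaves the plain cosine carrier.
plain = feedback_am(440.0, beta=0.0, n_samples=1000)
# A nonzero beta lets the output modulate the carrier amplitude, adding harmonics.
rich = feedback_am(440.0, beta=0.5, n_samples=1000)
print(max(abs(v) for v in plain), max(abs(v) for v in rich))
```

    With `beta` below 1 the output magnitude stays bounded by 1 / (1 - beta), which makes this form cheap to run per sample. The variations listed in the abstract (extra delay terms, waveshaping of the feedback signal, further heterodyning, longer loop delay) would each modify the single update line inside the loop.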